Annotated Trees and their Applications to XML Compression

Authors

  • Tomasz Müldner
  • Jan Krzysztof Miziolek
  • Tyler Corbin
Abstract

Permutation-based XML-conscious compressors permute the input document to improve the compression ratio and to support efficient operations, such as queries or updates. One such compressor, XSAQCT, uses the properties of the permuted document, called an annotated tree, to support these operations. This paper provides the formal background for the definition of an annotated tree of an XML document D. It also provides an algorithm for creating an annotated tree from an XML document, together with the reverse algorithm, and discusses a measure of compressibility based on the annotated tree. The theoretical and algorithmic treatment is followed by experimental results showing the compressibility of annotated trees and by a general analysis of semi-structured data and XML compression.


Similar resources

XML tree structure compression using RePair

XML tree structures can conveniently be represented using ordered unranked trees. Due to the repetitiveness of XML markup these trees can be compressed effectively using dictionary-based methods, such as minimal directed acyclic graphs (DAGs) or straight-line context-free (SLCF) tree grammars. While minimal SLCF tree grammars are in general smaller than minimal DAGs, they cannot be computed in ...


Pólya Urn Models and Connections to Random Trees: A Review

This paper reviews Pólya urn models and their connection to random trees. Basic results are presented, together with proofs that underlie the historical evolution of the accompanying thought process. Extensions and generalizations are given according to chronology: • Pólya-Eggenberger’s urn • Bernard Friedman’s urn • Generalized Pólya urns • Extended urn schemes • Invertible urn schemes ...


Automized Generation of Typed Syntax Trees via XML

The XANTLR/TDOM project is an implementation of a “typed” XML[2] Document Object Model initially used to represent abstract syntax trees in a compiler project. Tree classes, SAX event receivers, visitor classes and DTD are automatically derived from a sparsely annotated ANTLR grammar. Mapping tag values onto the type system of the target language allows for the compilation of syntax, mostly yie...


Compressing and Filtering XML Streams

Information technology is widely adopting the use of XML for information exchange. As messaging standards migrate to XML, there is growing concern for the magnitude of messages compared to binary formatted messages. XML compression can help mitigate the risk of exceeding the capacity of current communication resources. However, it is critical that compression technologies do not hinder XML quer...


Updates on Grammar-Compressed XML Data

In this paper, we present updates on CluX, a grammar-based XML compression approach based on clustering XML sub-trees. We show that updates on CluX-compressed data can be performed faster than decompressing the data, loading it into main memory and compressing it. Furthermore, we show how to support fast multiple updates, e.g. performing 100 updates in parallel is more than 70 times faster than...




Publication date: 2014